CONVERGENCE ANALYSIS OF A PROXIMAL POINT ALGORITHM
FOR MINIMIZING DIFFERENCES OF FUNCTIONS 


Nguyen Thai An, Nguyen Mau Nam

Abstract. Many optimization schemes are available for convex optimization problems, but numerical algorithms for solving nonconvex optimization problems remain underdeveloped. One way to go beyond convexity is to consider the class of functions representable as differences of convex functions. In this paper, we introduce a generalized proximal point algorithm to minimize the difference of a nonconvex function and a convex function. We also study the convergence of this algorithm under the main assumption that the objective function satisfies the Kurdyka-Łojasiewicz property.
1 Introduction

In this paper, we introduce an algorithm for solving optimization problems in which the objective function can be represented as the difference of a nonconvex function and a convex function, and we study its convergence. The structure of the problem under consideration is flexible enough to include the problem of minimizing a smooth function on a closed set and the problem of minimizing a DC function, where DC stands for Difference of Convex functions. DC programming is one of the most successful approaches for going beyond convexity. The class of DC functions is closed under many operations commonly used in optimization and is large enough to contain many objective functions arising in applications. Moreover, this class of functions possesses useful generalized differentiation properties and is well suited to numerical optimization schemes; see [1-3] and the references therein.

A pioneer in this research direction is Pham Dinh Tao, who introduced a simple algorithm called the (DCA), based on generalized differentiation of the functions involved as well as their Fenchel conjugates [4]. Over the past three decades, Pham Dinh Tao, Le Thi Hoai An, and many others have contributed to the mathematical foundation of the algorithm and made it accessible for applications. The (DCA) has become a classical tool in the field of optimization due to several key features, including simplicity, low cost, flexibility, and efficiency; see [5-8].
The proximal point algorithm (PPA for short) was suggested by Martinet [9] for solving convex optimization problems and was extensively developed by Rockafellar [10] in the context of monotone variational inequalities. The main idea of this method is to replace the initial problem with a sequence of regularized problems, so that each auxiliary problem can be solved by one of the well-known algorithms. Along with the (DCA), a number of proximal point optimization schemes have been proposed in [11-14] to minimize differences of convex functions. Although convergence results for the (DCA) and the proximal point algorithms for minimizing differences of convex functions have been addressed in recent research, the convergence analysis of algorithms for minimizing differences of functions in which convexity is not assumed remains an open research question.

Based on the method developed recently in [15-17], we study a proximal point algorithm for minimizing the difference of nonsmooth functions in which only the second function involved is required to be convex. Under the main assumption that the objective function satisfies the Kurdyka-Łojasiewicz property, we are able to analyze the convergence of the algorithm. Our results continue recent progress in using the Kurdyka-Łojasiewicz property and variational analysis to study nonsmooth numerical algorithms, pioneered by Attouch, Bolte, Redont, Soubeyran, and many others. The paper is organized as follows. In Section 2, we provide tools of variational analysis used throughout the paper. Section 3 is the main section of the paper, devoted to the generalized proximal point algorithm and its convergence results. Applications to trust-region subproblems and nonconvex feasibility problems are presented in Section 4.

2 Tools of Variational Analysis 

In this section, we recall some basic concepts and results of generalized differentiation for nonsmooth functions used throughout the paper; see, e.g., [18-21] for more details. We use $\mathbb{R}^n$ to denote the $n$-dimensional Euclidean space, $\langle \cdot, \cdot \rangle$ to denote the inner product, and $\|\cdot\|$ to denote the associated Euclidean norm. For an extended-real-valued function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$, the domain of $f$ is the set

$$\operatorname{dom} f = \{x \in \mathbb{R}^n : f(x) < +\infty\}.$$

The function $f$ is said to be proper if its domain is nonempty.

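Given a lower semicontinuous function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ with $\bar{x} \in \operatorname{dom} f$, the Fréchet subdifferential of $f$ at $\bar{x}$ is defined by

$$\partial^F f(\bar{x}) = \Big\{ v \in \mathbb{R}^n : \liminf_{x \to \bar{x}} \frac{f(x) - f(\bar{x}) - \langle v, x - \bar{x} \rangle}{\|x - \bar{x}\|} \ge 0 \Big\}.$$

We set $\partial^F f(\bar{x}) = \emptyset$ if $\bar{x} \notin \operatorname{dom} f$. Note that the Fréchet subdifferential mapping does not have a closed graph, so it is unstable computationally. Based on the Fréchet subdifferential, the limiting/Mordukhovich subdifferential of $f$ at $\bar{x} \in \operatorname{dom} f$ is defined by

$$\partial^L f(\bar{x}) = \mathop{\mathrm{Lim\,sup}}_{x \xrightarrow{f} \bar{x}} \partial^F f(x) = \{ v \in \mathbb{R}^n : \exists\, x^k \xrightarrow{f} \bar{x},\ v^k \in \partial^F f(x^k),\ v^k \to v \},$$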
where the notation $x \xrightarrow{f} \bar{x}$ means that $x \to \bar{x}$ and $f(x) \to f(\bar{x})$. We also set $\partial^L f(\bar{x}) = \emptyset$ if $\bar{x} \notin \operatorname{dom} f$. The definition yields the following robustness/closedness property of $\partial^L f$:

$$\{ v \in \mathbb{R}^n : \exists\, x^k \xrightarrow{f} \bar{x},\ v^k \to v,\ v^k \in \partial^L f(x^k) \} = \partial^L f(\bar{x}).$$

Obviously, we have $\partial^F f(x) \subset \partial^L f(x)$ for every $x \in \mathbb{R}^n$, where the first set is closed and convex while the second one is closed; see [22, Theorem 8.6, p. 302]. If $f$ is differentiable at $\bar{x}$, then $\partial^F f(\bar{x}) = \{\nabla f(\bar{x})\}$. Moreover, if $f$ is continuously differentiable on a neighborhood of $\bar{x}$, then $\partial^L f(\bar{x}) = \{\nabla f(\bar{x})\}$. When $f$ is convex, the Fréchet and the limiting subdifferentials reduce to the subdifferential in the sense of convex analysis:

$$\partial f(\bar{x}) = \{ v \in \mathbb{R}^n : \langle v, x - \bar{x} \rangle \le f(x) - f(\bar{x}),\ \forall x \in \mathbb{R}^n \}.$$
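A standard one-dimensional example, added here for illustration and not taken from the paper, shows that the inclusion $\partial^F f(\bar{x}) \subset \partial^L f(\bar{x})$ can be strict for a nonconvex function. Take $f(x) = -|x|$ on $\mathbb{R}$ and $\bar{x} = 0$:

```latex
% Illustrative example: f(x) = -|x| on \mathbb{R}, evaluated at \bar{x} = 0.
% Frechet: v must satisfy -1 - v\,\mathrm{sign}(x) \ge 0 for both signs of x,
% i.e. v \le -1 and v \ge 1 simultaneously, which is impossible.
\[
\liminf_{x \to 0} \frac{-|x| - v x}{|x|} = -1 - |v| < 0
\quad\Longrightarrow\quad
\partial^F f(0) = \emptyset .
\]
% Limiting: f'(x) = -1 for x > 0 and f'(x) = 1 for x < 0, so the limits of
% Frechet subgradients along x^k \to 0 are exactly -1 and 1.
\[
\partial^L f(0) = \{-1,\, 1\} .
\]
```

Here the limiting subdifferential is nonempty and closed but not convex, while the Fréchet subdifferential is empty.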

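For a convex subset $\Omega$ of $\mathbb{R}^n$ and $\bar{x} \in \Omega$, the normal cone to $\Omega$ at $\bar{x}$ is the set

$$N(\bar{x}; \Omega) = \{ v \in \mathbb{R}^n : \langle v, x - \bar{x} \rangle \le 0,\ \forall x \in \Omega \}.$$

This normal cone can be represented as the subdifferential at the point under consideration of the indicator function

$$\delta(x; \Omega) = \begin{cases} 0 & \text{if } x \in \Omega, \\ +\infty & \text{if } x \notin \Omega, \end{cases}$$

i.e., $N(\bar{x}; \Omega) = \partial \delta(\bar{x}; \Omega)$. We use the notation $\operatorname{dist}(x; \Omega)$ to denote the distance from $x$ to $\Omega$, i.e., $\operatorname{dist}(x; \Omega) = \inf_{w \in \Omega} \|x - w\|$. The notation $P_\Omega(x) = \{ w \in \Omega : \|x - w\| = \operatorname{dist}(x; \Omega) \}$ stands for the projection of $x$ onto $\Omega$. We also write $d_\Omega(x)$ for $\operatorname{dist}(x; \Omega)$ where convenient.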

Another subdifferential concept, called the Clarke subdifferential, was defined in [18] based on generalized directional derivatives. The Clarke subdifferential of a function $f$ that is locally Lipschitz continuous around $\bar{x}$ can be represented in terms of the limiting subdifferential:

$$\partial^C f(\bar{x}) = \operatorname{co} \partial^L f(\bar{x}).$$

Here $\operatorname{co} \Omega$ denotes the convex hull of an arbitrary set $\Omega$.


Proposition 2.1 ([22, Exercise 8.8, p. 304]). Let $f = g + h$, where $g$ is lower semicontinuous and $h$ is continuously differentiable on a neighborhood of $\bar{x}$. Then

$$\partial^F f(\bar{x}) = \partial^F g(\bar{x}) + \nabla h(\bar{x}) \quad \text{and} \quad \partial^L f(\bar{x}) = \partial^L g(\bar{x}) + \nabla h(\bar{x}).$$

Proposition 2.2 ([22, Theorem 10.1, p. 422]). If a lower semicontinuous function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ has a local minimum at $\bar{x} \in \operatorname{dom} f$, then $0 \in \partial^F f(\bar{x}) \subset \partial^L f(\bar{x})$. In the convex case, this condition is not only necessary for a local minimum but also sufficient for a global minimum.

Proposition 2.3 Let $h : \mathbb{R}^n \to \mathbb{R}$ be a finite convex function on $\mathbb{R}^n$. If $y^k \in \partial h(x^k)$ for all $k$ and $\{x^k\}$ is bounded, then the sequence $\{y^k\}$ is also bounded.

Proof The result follows from the fact that $h$ is locally Lipschitz continuous on $\mathbb{R}^n$ and [22, Definition 5.14, Proposition 5.15, Theorem 9.13]. □

Following [15,16], a lower semicontinuous function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ satisfies the Kurdyka-Łojasiewicz property at $x^* \in \operatorname{dom} \partial^L f$ if there exist $\nu > 0$, a neighborhood $V$ of $x^*$, and a continuous concave function $\varphi : [0, \nu[ \to [0, +\infty[$ with

(i) $\varphi(0) = 0$.

(ii) $\varphi$ is of class $C^1$ on $]0, \nu[$.

(iii) $\varphi' > 0$ on $]0, \nu[$.

(iv) For every $x \in V$ with $f(x^*) < f(x) < f(x^*) + \nu$, we have

$$\varphi'\big(f(x) - f(x^*)\big) \operatorname{dist}\big(0, \partial^L f(x)\big) \ge 1.$$

We say that $f$ satisfies the strong Kurdyka-Łojasiewicz property at $x^*$ if the same assertion holds for the Clarke subdifferential $\partial^C f(x)$.
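As a sanity check on condition (iv), the following minimal sketch evaluates the left-hand side for the smooth function $f(x) = x^2$ at $x^* = 0$ with $\varphi(s) = c\, s^{1-\theta}$, $c = 1$, $\theta = 1/2$. The test function, the constants, and the helper name `kl_product` are assumptions chosen for illustration; they do not come from the paper.

```python
def kl_product(x, c=1.0, theta=0.5):
    """phi'(f(x) - f(x*)) * dist(0, subdifferential of f at x) for f(x) = x**2, x* = 0."""
    gap = x * x                                   # f(x) - f(x*)
    dphi = c * (1.0 - theta) * gap ** (-theta)    # phi'(s) for phi(s) = c * s**(1 - theta)
    dist = abs(2.0 * x)                           # f is smooth, so the subdifferential is {f'(x)}
    return dphi * dist

# Sample points approaching x* = 0 from the right.
samples = [10.0 ** (-j) for j in range(1, 7)]
values = [kl_product(x) for x in samples]
```

For this choice the product equals 1 at every sample, so inequality (iv) holds with equality near $x^*$.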

According to [15, Lemma 2.1], a proper lower semicontinuous function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ has the Kurdyka-Łojasiewicz property at any point $\bar{x} \in \mathbb{R}^n$ such that $0 \notin \partial^L f(\bar{x})$. Recall that a subset $\Omega$ of $\mathbb{R}^n$ is called semi-algebraic if it can be represented as a finite union of sets of the form

$$\{ x \in \mathbb{R}^n : p_i(x) = 0,\ q_i(x) < 0 \text{ for all } i = 1, \ldots, m \},$$

where $p_i$ and $q_i$ for $i = 1, \ldots, m$ are polynomial functions. A function $f$ is said to be semi-algebraic if its graph is a semi-algebraic subset of $\mathbb{R}^{n+1}$. It is known that a proper lower semicontinuous semi-algebraic function always satisfies the Kurdyka-Łojasiewicz property; see [15,23]. Bolte et al. [23, Theorem 14] showed that the class of definable functions, which contains the class of semi-algebraic functions, satisfies the strong Kurdyka-Łojasiewicz property at each point of $\operatorname{dom} \partial^C f$.

3 A Generalized Proximal Point Algorithm for Minimizing 
Differences of Functions 

We focus on the convergence analysis of a proximal point algorithm for solving nonconvex optimization problems of the following type:

$$\min \{ f(x) = g_1(x) + g_2(x) - h(x) : x \in \mathbb{R}^n \}, \tag{3.1}$$

where $g_1 : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is proper and lower semicontinuous, $g_2 : \mathbb{R}^n \to \mathbb{R}$ is differentiable with $L$-Lipschitz gradient, and $h : \mathbb{R}^n \to \mathbb{R}$ is convex. The specific structure of (3.1) is flexible enough to include the problem of minimizing a smooth function on a closed constraint set:

$$\min \{ g(x) : x \in \Omega \},$$

and the general DC problem:

$$\min \{ f(x) = g(x) - h(x) : x \in \mathbb{R}^n \}, \tag{3.2}$$

where $g : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a proper lower semicontinuous convex function and $h : \mathbb{R}^n \to \mathbb{R}$ is convex.

It is well known that if $\bar{x} \in \operatorname{dom} f$ is a local minimizer of (3.2), then

$$\partial h(\bar{x}) \subset \partial g(\bar{x}). \tag{3.3}$$

Any point $\bar{x} \in \operatorname{dom} f$ that satisfies (3.3) is called a stationary point of (3.2), and any point $\bar{x} \in \operatorname{dom} f$ such that $\partial g(\bar{x}) \cap \partial h(\bar{x}) \ne \emptyset$ is called a critical point of this problem. Since $h$ is a finite convex function, its subdifferential at any point is nonempty, and hence any stationary point of (3.2) is a critical point; see [5, 24, 25] and the references therein for more details.

Let us recall below a necessary optimality condition from [26] for minimizing differences 
of functions in the nonconvex settings. 

Proposition 3.1 ([26, Proposition 4.1]) Consider the difference function $f = g - h$, where $g : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ and $h : \mathbb{R}^n \to \mathbb{R}$ are lower semicontinuous functions. If $\bar{x} \in \operatorname{dom} f$ is a local minimizer of $f$, then we have the inclusion

$$\partial^F h(\bar{x}) \subset \partial^F g(\bar{x}).$$

If in addition $h$ is convex, then $\partial h(\bar{x}) \subset \partial^L g(\bar{x})$.

When adapting to the setting of (3.1), we obtain the following optimality condition.

Proposition 3.2 If $\bar{x} \in \operatorname{dom} f$ is a local minimizer of the function $f$ considered in (3.1), then

$$\partial h(\bar{x}) \subset \partial^L g_1(\bar{x}) + \nabla g_2(\bar{x}). \tag{3.4}$$

Proof The assertion follows from Proposition 2.1 and Proposition 3.1. □

Following the DC case, any point $\bar{x} \in \operatorname{dom} f$ satisfying condition (3.4) is called a stationary point of (3.1). In general, this condition is hard to attain, and we may relax it to

$$[\partial^L g_1(\bar{x}) + \nabla g_2(\bar{x})] \cap \partial h(\bar{x}) \ne \emptyset \tag{3.5}$$

and call $\bar{x}$ a critical point of $f$. Obviously, every stationary point $\bar{x}$ is a critical point. Moreover, by [26, Corollary 3.4], at any point $\bar{x}$ with $g_1(\bar{x}) < +\infty$, we have

$$\partial^L (g_1 + g_2 - h)(\bar{x}) \subset \partial^L g_1(\bar{x}) + \nabla g_2(\bar{x}) - \partial h(\bar{x}).$$

Thus, if $0 \in \partial^L f(\bar{x})$, then $\bar{x}$ is a critical point of $f$ in the sense of (3.5). The converse is not true in general, as shown by the following example. Consider the functions

$$f(x) = 2|x| + 3x, \quad g_1(x) = 3|x|, \quad g_2(x) = 3x, \quad \text{and} \quad h(x) = |x|.$$

In this case, $\bar{x} = 0$ satisfies (3.5) but $0 \notin \partial^L f(\bar{x})$, since $\partial g_1(0) = [-3, 3]$, $\nabla g_2(0) = 3$, $\partial h(0) = [-1, 1]$, and $\partial f(0) = [1, 5]$. However, it is easy to check that these two conditions are equivalent when $h$ is differentiable on $\mathbb{R}^n$.
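The interval arithmetic behind this example can be checked mechanically. The short sketch below uses two hypothetical helpers, `shift` and `intersect`, that are not part of the paper, to confirm that the set in (3.5) is nonempty at $\bar{x} = 0$ while $0$ lies outside $\partial f(0)$.

```python
def shift(interval, c):
    """Translate the closed interval [lo, hi] by the scalar c."""
    lo, hi = interval
    return (lo + c, hi + c)

def intersect(a, b):
    """Intersection of two closed intervals, or None if it is empty."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

sub_g1 = (-3.0, 3.0)   # subdifferential of g1(x) = 3|x| at 0
grad_g2 = 3.0          # derivative of g2(x) = 3x
sub_h = (-1.0, 1.0)    # subdifferential of h(x) = |x| at 0
sub_f = (1.0, 5.0)     # subdifferential of the convex function f(x) = 2|x| + 3x at 0

# Left-hand side of (3.5): [dg1(0) + grad g2(0)] intersected with dh(0).
critical_set = intersect(shift(sub_g1, grad_g2), sub_h)
zero_in_sub_f = sub_f[0] <= 0.0 <= sub_f[1]
```

The intersection is the interval $[0, 1]$, so $\bar{x} = 0$ is critical, although $0 \notin [1, 5]$.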
We now recall the Moreau/Moreau-Yosida proximal mapping for a nonconvex function; see [22, page 20]. Let $g : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a proper lower semicontinuous function. The Moreau proximal mapping with regularization parameter $t > 0$, $\operatorname{prox}_t^g : \mathbb{R}^n \to 2^{\mathbb{R}^n}$, is defined by

$$\operatorname{prox}_t^g(x) = \operatorname{argmin} \Big\{ g(u) + \frac{t}{2} \|u - x\|^2 : u \in \mathbb{R}^n \Big\}.$$

As an interesting case, when $g$ is the indicator function $\delta(\cdot; \Omega)$ associated with a nonempty closed set $\Omega$, $\operatorname{prox}_t^g(x)$ coincides with the projection mapping.

Under the assumption $\inf_{x \in \mathbb{R}^n} g(x) > -\infty$, the lower semicontinuity of $g$ and the coercivity of the squared norm imply that the proximal mapping is well defined; see [27, Proposition 2.2].

Proposition 3.3 Let $g : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a proper lower semicontinuous function with $\inf_{x \in \mathbb{R}^n} g(x) > -\infty$. Then, for every $t \in (0, +\infty)$, the set $\operatorname{prox}_t^g(x)$ is nonempty and compact for every $x \in \mathbb{R}^n$.
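To make the set-valued nature of the proximal mapping concrete, here is a minimal sketch for one nonconvex choice of $g$, namely $g(u) = \lambda \cdot \#\{i : u_i \ne 0\}$, which is lower semicontinuous and bounded below by zero. This choice of $g$, the parameter name `lam`, and both helper functions are assumptions for illustration only. Because $g$ is separable, each coordinate either keeps $x_i$ (cost $\lambda$) or is set to zero (cost $\frac{t}{2} x_i^2$).

```python
def prox_l0(x, lam, t):
    """One element of prox_t^g(x) for g(u) = lam * (number of nonzero entries of u)."""
    threshold = (2.0 * lam / t) ** 0.5
    # Keep x_i when (t/2) * x_i**2 > lam; at exact ties both 0 and x_i are
    # minimizers, which is where the mapping becomes set-valued.
    return [xi if abs(xi) > threshold else 0.0 for xi in x]

def prox_objective(u, x, lam, t):
    """Value of g(u) + (t/2) * ||u - x||^2."""
    g = lam * sum(1 for ui in u if ui != 0.0)
    return g + 0.5 * t * sum((ui - xi) ** 2 for ui, xi in zip(u, x))

x = [3.0, 0.5, -2.0]
u = prox_l0(x, lam=1.0, t=2.0)
```

With $\lambda = 1$ and $t = 2$ the threshold is $1$, so the middle coordinate is zeroed and the other two are kept.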


We now introduce a new generalized proximal point algorithm for solving (3.1). Let us begin with the result below regarding an upper bound for a smooth function with Lipschitz continuous gradient; see [28,29].

Proposition 3.4 If $g : \mathbb{R}^n \to \mathbb{R}$ is a differentiable function with $L$-Lipschitz gradient, then

$$g(y) \le g(x) + \langle \nabla g(x), y - x \rangle + \frac{L}{2} \|y - x\|^2 \quad \text{for all } x, y \in \mathbb{R}^n. \tag{3.6}$$
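Inequality (3.6) is easy to probe numerically. The sketch below checks it on a grid for $g(x) = \sin x$, whose derivative $\cos x$ is $1$-Lipschitz; the test function, the grid, and the helper name `upper_bound_gap` are illustrative assumptions.

```python
import math

def upper_bound_gap(g, dg, L, x, y):
    """Right-hand side of (3.6) minus g(y); nonnegative when (3.6) holds."""
    return g(x) + dg(x) * (y - x) + 0.5 * L * (y - x) ** 2 - g(y)

# g(x) = sin(x) has derivative cos(x), which is Lipschitz with constant L = 1.
grid = [-3.0 + 0.25 * i for i in range(25)]
gaps = [upper_bound_gap(math.sin, math.cos, 1.0, x, y) for x in grid for y in grid]
```

Every entry of `gaps` should be nonnegative up to rounding, with zeros exactly on the diagonal $x = y$.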


Let us introduce the generalized proximal point algorithm (GPPA) below to solve (3.1). 


Generalized Proximal Point Algorithm (GPPA)

1. Initialization: Choose $x^0 \in \operatorname{dom} g_1$ and a tolerance $\varepsilon > 0$. Fix any $t > L$.

2. Find

$$y^k \in \partial h(x^k).$$

3. Find $x^{k+1}$ as follows:

$$x^{k+1} \in \operatorname{prox}_t^{g_1} \Big( x^k - \frac{\nabla g_2(x^k) - y^k}{t} \Big). \tag{3.7}$$

4. If $\|x^k - x^{k+1}\| < \varepsilon$, then exit. Otherwise, increase $k$ by 1 and go back to step 2.


From the definition of the proximal mapping, (3.7) is equivalent to saying that

$$x^{k+1} \in \operatorname*{argmin}_{x \in \mathbb{R}^n} \Big\{ g_1(x) - \langle y^k - \nabla g_2(x^k), x - x^k \rangle + \frac{t}{2} \|x - x^k\|^2 \Big\}. \tag{3.8}$$
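The four steps of the (GPPA) can be sketched in a few lines for a scalar problem. Everything specific below is an assumption made for illustration and not taken from the paper: the function names, the toy instance with $g_1$ the indicator of the nonconvex set $(-\infty, -1] \cup [1, +\infty)$, $g_2(x) = x^2/2$ (so $L = 1$), $h(x) = |x|$, and the choice $t = 2$.

```python
def gppa(x0, subgrad_h, grad_g2, prox_g1, t, eps=1e-8, max_iter=1000):
    """Scalar sketch of the (GPPA); returns the list of iterates x^0, x^1, ..."""
    xs = [x0]
    for _ in range(max_iter):
        x = xs[-1]
        y = subgrad_h(x)                               # step 2: y^k in dh(x^k)
        x_next = prox_g1(x - (grad_g2(x) - y) / t, t)  # step 3: update (3.7)
        xs.append(x_next)
        if abs(x - x_next) < eps:                      # step 4: stopping test
            break
    return xs

def project_outside_unit_interval(z, t):
    """Prox of the indicator of (-inf, -1] U [1, +inf): a projection, independent of t."""
    if abs(z) >= 1.0:
        return z
    return 1.0 if z >= 0.0 else -1.0   # at z = 0 both -1 and 1 are nearest; pick 1

def sign_subgradient(x):
    """One element of the subdifferential of h(x) = |x|."""
    return 1.0 if x >= 0.0 else -1.0

def f(x):
    """Objective g2 - h on dom g1 (g1 vanishes there)."""
    return 0.5 * x * x - abs(x)

iterates = gppa(3.0, sign_subgradient, lambda x: x,
                project_outside_unit_interval, t=2.0)
```

Starting from $x^0 = 3$, the iterates are $2, 1.5, 1.25, \ldots$ and approach the boundary point $1$ of the feasible set, halving the distance at each step.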


Theorem 3.5 Consider the (GPPA) for solving (3.1) in which $g_1 : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is proper and lower semicontinuous with $\inf_{x \in \mathbb{R}^n} g_1(x) > -\infty$, $g_2 : \mathbb{R}^n \to \mathbb{R}$ is differentiable with $L$-Lipschitz gradient, and $h : \mathbb{R}^n \to \mathbb{R}$ is convex. Then

(i) For any $k \ge 1$, we have

$$f(x^k) - f(x^{k+1}) \ge \frac{t - L}{2} \|x^k - x^{k+1}\|^2. \tag{3.9}$$

(ii) If $\alpha = \inf_{x \in \mathbb{R}^n} f(x) > -\infty$, then $\lim_{k \to +\infty} f(x^k) = \ell^* \ge \alpha$ and $\lim_{k \to +\infty} \|x^k - x^{k+1}\| = 0$.

(iii) If $\alpha = \inf_{x \in \mathbb{R}^n} f(x) > -\infty$ and $\{x^k\}$ is bounded, then every cluster point of $\{x^k\}$ is a critical point of $f$.


Proof (i) By Proposition 2.1 and Proposition 2.2, it follows from (3.8) that

$$y^k - \nabla g_2(x^k) \in \partial^L g_1(x^{k+1}) + t (x^{k+1} - x^k). \tag{3.10}$$

Since $y^k \in \partial h(x^k)$,

$$h(x^{k+1}) \ge h(x^k) + \langle y^k, x^{k+1} - x^k \rangle. \tag{3.11}$$

From (3.8), we have

$$g_1(x^k) \ge g_1(x^{k+1}) - \langle y^k - \nabla g_2(x^k), x^{k+1} - x^k \rangle + \frac{t}{2} \|x^k - x^{k+1}\|^2. \tag{3.12}$$

Adding (3.11) and (3.12) and using (3.6), we get

$$\begin{aligned} g_1(x^k) - h(x^k) &\ge g_1(x^{k+1}) - h(x^{k+1}) + \langle \nabla g_2(x^k), x^{k+1} - x^k \rangle + \frac{t}{2} \|x^k - x^{k+1}\|^2 \\ &\ge g_1(x^{k+1}) - h(x^{k+1}) + \Big( g_2(x^{k+1}) - g_2(x^k) - \frac{L}{2} \|x^k - x^{k+1}\|^2 \Big) + \frac{t}{2} \|x^k - x^{k+1}\|^2. \end{aligned}$$

This implies

$$f(x^k) - f(x^{k+1}) \ge \frac{t - L}{2} \|x^k - x^{k+1}\|^2.$$

Assertion (i) has been proved.

(ii) It follows from the assumptions made and (i) that $\{f(x^k)\}$ is monotone decreasing and bounded below, so the first assertion of (ii) is obvious. Observe that

$$\sum_{k=1}^{m} \|x^k - x^{k+1}\|^2 \le \frac{2}{t - L} \big( f(x^1) - f(x^{m+1}) \big) \le \frac{2}{t - L} \big( f(x^1) - \alpha \big) \quad \text{for all } m \in \mathbb{N}.$$

Thus, the sequence $\{\|x^k - x^{k+1}\|\}$ converges to 0.

(iii) From (3.8), for all $x \in \mathbb{R}^n$, we have

$$g_1(x^{k+1}) - \langle w^k, x^{k+1} - x^k \rangle + \frac{t}{2} \|x^{k+1} - x^k\|^2 \le g_1(x) - \langle w^k, x - x^k \rangle + \frac{t}{2} \|x - x^k\|^2, \tag{3.13}$$

where $w^k = y^k - \nabla g_2(x^k)$. Now suppose further that $\{x^k\}$ is bounded. Since $h$ is a finite convex function on $\mathbb{R}^n$, $y^k \in \partial h(x^k)$, and $\{x^k\}$ is bounded, Proposition 2.3 shows that $\{y^k\}$ is also bounded. We can take two subsequences, $\{x^{k_\ell}\}$ of $\{x^k\}$ and $\{y^{k_\ell}\}$ of $\{y^k\}$, that converge to $x^*$ and $y^*$, respectively. Because $\|x^{k_\ell} - x^{k_\ell + 1}\| \to 0$ as $\ell \to +\infty$, we deduce from (3.13) that

$$\limsup_{\ell \to +\infty} g_1(x^{k_\ell + 1}) \le g_1(x) - \langle y^* - \nabla g_2(x^*), x - x^* \rangle + \frac{t}{2} \|x - x^*\|^2 \quad \text{for all } x \in \mathbb{R}^n.$$

In particular, for $x = x^*$, we get

$$\limsup_{\ell \to +\infty} g_1(x^{k_\ell + 1}) \le g_1(x^*).$$

Combining this with the lower semicontinuity of $g_1$, we get

$$\lim_{\ell \to +\infty} g_1(x^{k_\ell + 1}) = g_1(x^*).$$

From the closedness of the graph of the subdifferential mapping $\partial h(\cdot)$, we have $y^* \in \partial h(x^*)$. It follows from (3.10) that there exists $z^{k_\ell + 1} \in \partial^L g_1(x^{k_\ell + 1})$ satisfying

$$\| y^{k_\ell} - \nabla g_2(x^{k_\ell}) - z^{k_\ell + 1} \| = t \| x^{k_\ell} - x^{k_\ell + 1} \|.$$

By (ii) and the Lipschitz continuity of $\nabla g_2$,

$$\lim_{\ell \to +\infty} z^{k_\ell + 1} = y^* - \nabla g_2(x^*) =: z^*.$$

Thus, $x^{k_\ell + 1} \xrightarrow{g_1} x^*$, $z^{k_\ell + 1} \in \partial^L g_1(x^{k_\ell + 1})$, and $z^{k_\ell + 1} \to z^*$ as $\ell \to +\infty$, so it follows from the robustness of the limiting subdifferential that $z^* \in \partial^L g_1(x^*)$. Therefore,

$$y^* \in [\partial^L g_1(x^*) + \nabla g_2(x^*)] \cap \partial h(x^*).$$

This implies that $x^*$ is a critical point of $f$ and the proof is complete. □

Proposition 3.6 Suppose that $\inf_{x \in \mathbb{R}^n} f(x) > -\infty$ and that $f$ is proper and lower semicontinuous. If the (GPPA) sequence $\{x^k\}$ has a cluster point $x^*$, then $\lim_{k \to +\infty} f(x^k) = f(x^*)$. Thus, $f$ has the same value at all cluster points of $\{x^k\}$.

Proof Since $\inf_{x \in \mathbb{R}^n} f(x) > -\infty$, it follows from (3.9) that the sequence of real numbers $\{f(x^k)\}$ is non-increasing and bounded below. Thus, $\lim_{k \to +\infty} f(x^k) = \ell^*$ exists. If $\{x^{k_\ell}\}$ is a subsequence converging to $x^*$, then by the lower semicontinuity of $f$, we have $\liminf_{\ell \to +\infty} f(x^{k_\ell}) \ge f(x^*)$. Observe from the structure of $f$ that $\operatorname{dom} f = \operatorname{dom} g_1$. Since $g_2$ and $h$ are continuous, $f$ is proper and lower semicontinuous if and only if $g_1$ is proper and lower semicontinuous. To prove the opposite inequality, we employ the proof of (iii) of Theorem 3.5 and get

$$\begin{aligned} \limsup_{\ell \to +\infty} f(x^{k_\ell}) &= \limsup_{\ell \to +\infty} \big( g_1(x^{k_\ell}) + g_2(x^{k_\ell}) - h(x^{k_\ell}) \big) \\ &\le \limsup_{\ell \to +\infty} g_1(x^{k_\ell}) + \limsup_{\ell \to +\infty} g_2(x^{k_\ell}) - \liminf_{\ell \to +\infty} h(x^{k_\ell}) \\ &\le g_1(x^*) + g_2(x^*) - h(x^*) = f(x^*). \end{aligned}$$

Combining this with the uniqueness of the limit, we have $\ell^* = f(x^*)$. The proof is complete. □

Remark 3.7 (i) If $g_1$ is also convex, we can get a stronger inequality than (3.9) and relax the range of the regularization parameter $t$. Indeed, using the definition of the subdifferential in the sense of convex analysis in (3.10), we have

$$\langle y^k - \nabla g_2(x^k) - t (x^{k+1} - x^k), x^k - x^{k+1} \rangle \le g_1(x^k) - g_1(x^{k+1}).$$

Since $y^k \in \partial h(x^k)$,

$$h(x^{k+1}) \ge h(x^k) + \langle y^k, x^{k+1} - x^k \rangle.$$

Adding these inequalities and using (3.6) gives

$$f(x^k) - f(x^{k+1}) \ge \Big( t - \frac{L}{2} \Big) \|x^k - x^{k+1}\|^2.$$

Thus, we can choose $t > \frac{L}{2}$ instead of $t > L$ as before.

(ii) When $h(x) = 0$, the (GPPA) reduces to the proximal forward-backward algorithm for minimizing $f = g_1 + g_2$ considered in [30]. If $h(x) = 0$ and $g_1$ is the indicator function $\delta(\cdot; \Omega)$ associated with a nonempty closed set $\Omega$, then the (GPPA) reduces to the projected gradient method (PGM) for minimizing the smooth function $g_2$ on a nonconvex constraint set $\Omega$:

$$x^{k+1} \in P_\Omega \Big( x^k - \frac{1}{t} \nabla g_2(x^k) \Big).$$

(iii) If $g_2 = 0$, then the (GPPA) reduces to the (PPA) with constant stepsize proposed in [11,31].
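The projected gradient special case in part (ii) can be illustrated on a small nonconvex constraint set. The sketch below minimizes $g_2(x) = \frac{1}{2}\|x - a\|^2$ over the unit circle in $\mathbb{R}^2$; the set, the point `a`, the step parameter $t = 2$, and the function names are assumptions chosen for illustration, not an example from the paper.

```python
import math

def project_unit_circle(z):
    """One nearest point on the unit circle (any unit vector is nearest when z = 0)."""
    n = math.hypot(z[0], z[1])
    return (1.0, 0.0) if n == 0.0 else (z[0] / n, z[1] / n)

def projected_gradient(x0, grad, t, steps):
    """Iterate x^{k+1} in P_Omega(x^k - grad(x^k) / t) for a fixed number of steps."""
    x = x0
    for _ in range(steps):
        gx = grad(x)
        x = project_unit_circle((x[0] - gx[0] / t, x[1] - gx[1] / t))
    return x

a = (2.0, 0.0)                                   # hypothetical target point
grad_g2 = lambda x: (x[0] - a[0], x[1] - a[1])   # g2(x) = ||x - a||^2 / 2, so L = 1
x_final = projected_gradient((0.0, 1.0), grad_g2, t=2.0, steps=60)
```

From the starting point $(0, 1)$ the iterates rotate along the circle toward $(1, 0)$, the point of the circle closest to `a`.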

In the theorem below, we establish sufficient conditions that guarantee the convergence of the sequence $\{x^k\}$ generated by the (GPPA). These conditions include the Kurdyka-Łojasiewicz property of the function $f$ and the differentiability with Lipschitz gradient of $h$. In what follows, let $C^*$ denote the set of cluster points of the sequence $\{x^k\}$. We follow the method from [15,16].

Theorem 3.8 Suppose that $\inf_{x \in \mathbb{R}^n} f(x) > -\infty$ and $f$ is lower semicontinuous. Suppose further that $\nabla h$ is $L(h)$-Lipschitz continuous and $f$ has the Kurdyka-Łojasiewicz property at any point $x \in \operatorname{dom} f$. If $C^* \ne \emptyset$, then the (GPPA) sequence $\{x^k\}$ converges to a critical point of $f$.

Proof Take any $x^* \in C^*$ and a subsequence $\{x^{k_\ell}\}$ that converges to $x^*$. Applying Proposition 3.6 yields

$$\lim_{k \to +\infty} f(x^k) = \ell^* = f(x^*).$$

If $f(x^k) = \ell^*$ for some $k \ge 1$, then $f(x^k) = f(x^{k+p})$ for any $p \ge 0$ since the sequence $\{f(x^k)\}$ is monotone decreasing by (3.9). Therefore, $x^k = x^{k+p}$ for all $p \ge 0$, and the (GPPA) terminates after a finite number of steps. Without loss of generality, from now on we assume that $f(x^k) > \ell^*$ for all $k$.

Recall that the (GPPA) starts from a point $x^0 \in \operatorname{dom} g_1$ and generates two sequences $\{x^k\}$ and $\{y^k\}$ with $y^k \in \partial h(x^k) = \{\nabla h(x^k)\}$ and

$$y^{k-1} - \nabla g_2(x^{k-1}) - t (x^k - x^{k-1}) \in \partial^L g_1(x^k).$$

Thus, from Proposition 2.1 we have

$$\big( y^{k-1} - \nabla g_2(x^{k-1}) - t (x^k - x^{k-1}) \big) + \nabla g_2(x^k) - y^k \in \partial^L g_1(x^k) + \nabla g_2(x^k) - \nabla h(x^k) = \partial^L f(x^k).$$

Using the Lipschitz continuity of $\nabla g_2$ and $\nabla h$, we have

$$\begin{aligned} &\| y^{k-1} - \nabla g_2(x^{k-1}) - t (x^k - x^{k-1}) + \nabla g_2(x^k) - y^k \| \\ &\quad = \| (\nabla h(x^{k-1}) - \nabla h(x^k)) + (\nabla g_2(x^k) - \nabla g_2(x^{k-1})) - t (x^k - x^{k-1}) \| \\ &\quad \le (L(h) + L + t) \|x^{k-1} - x^k\| = M \|x^{k-1} - x^k\|, \end{aligned}$$

where $M := L(h) + L + t$. Therefore,

$$\operatorname{dist}\big(0; \partial^L f(x^k)\big) \le M \|x^{k-1} - x^k\|. \tag{3.14}$$

According to the assumption that $f$ has the Kurdyka-Łojasiewicz property at $x^*$, there exist $\nu > 0$, a neighborhood $V$ of $x^*$, and a continuous concave function $\varphi : [0, \nu[ \to [0, +\infty[$ so that for all $x \in V$ satisfying $\ell^* < f(x) < \ell^* + \nu$ we have

$$\varphi'\big(f(x) - \ell^*\big) \operatorname{dist}\big(0; \partial^L f(x)\big) \ge 1. \tag{3.15}$$

Let $\delta > 0$ be small enough such that $B(x^*; \delta) \subset V$. Using the facts that $\lim_{\ell \to +\infty} x^{k_\ell} = x^*$, $\lim_{k \to +\infty} \|x^{k+1} - x^k\| = 0$, $\lim_{k \to +\infty} f(x^k) = \ell^*$, and $f(x^k) > \ell^*$ for all $k$, we can find a natural number $N$ large enough satisfying

$$x^N \in B(x^*; \delta), \quad \ell^* < f(x^N) < \ell^* + \nu, \tag{3.16}$$

and

$$\|x^N - x^*\| + \frac{\|x^N - x^{N-1}\|}{4} + \gamma \varphi\big(f(x^N) - \ell^*\big) < \frac{3\delta}{4}, \tag{3.17}$$

where $\gamma = \frac{2M}{t - L} > 0$. We will show that $x^k \in B(x^*; \delta)$ for all $k \ge N$. To this end, we first show that whenever $x^k \in B(x^*; \delta)$ and $\ell^* < f(x^k) < \ell^* + \nu$ for some $k$, we have

$$\|x^k - x^{k+1}\| \le \frac{\|x^{k-1} - x^k\|}{4} + \gamma \big[ \varphi\big(f(x^k) - \ell^*\big) - \varphi\big(f(x^{k+1}) - \ell^*\big) \big]. \tag{3.18}$$

Indeed, by (3.14), the concavity of $\varphi$, (3.15), and (3.9), we have

$$\begin{aligned} M \|x^{k-1} - x^k\| \big[ \varphi\big(f(x^k) - \ell^*\big) - \varphi\big(f(x^{k+1}) - \ell^*\big) \big] &\ge \operatorname{dist}\big(0; \partial^L f(x^k)\big) \big[ \varphi\big(f(x^k) - \ell^*\big) - \varphi\big(f(x^{k+1}) - \ell^*\big) \big] \\ &\ge \operatorname{dist}\big(0; \partial^L f(x^k)\big) \varphi'\big(f(x^k) - \ell^*\big) \big[ f(x^k) - f(x^{k+1}) \big] \\ &\ge \frac{t - L}{2} \|x^k - x^{k+1}\|^2. \end{aligned}$$

It follows that

$$\varphi\big(f(x^k) - \ell^*\big) - \varphi\big(f(x^{k+1}) - \ell^*\big) \ge \frac{1}{\gamma} \frac{\|x^k - x^{k+1}\|^2}{\|x^{k-1} - x^k\|} \ge \frac{1}{\gamma} \Big[ \|x^k - x^{k+1}\| - \frac{\|x^{k-1} - x^k\|}{4} \Big], \tag{3.19}$$

where the last inequality holds since $\frac{a^2}{b} \ge a - \frac{b}{4}$ for any positive real numbers $a$ and $b$. This implies (3.18).

We next show that $x^k \in B(x^*; \delta)$ for all $k \ge N$ by induction. The claim is true for $k = N$ by the construction above. Now suppose that $x^N, \ldots, x^{N+k-1} \in B(x^*; \delta)$ for some $k \ge 1$. Since $\{f(x^k)\}$ is a non-increasing sequence that converges to $\ell^*$, our choice of $N$ implies that $\ell^* < f(x^k) < \ell^* + \nu$ for all $k \ge N$. In particular, (3.18) can be applied at each of the indices $N, \ldots, N+k-1$, which gives

$$\begin{aligned} \|x^N - x^{N+1}\| &\le \frac{\|x^{N-1} - x^N\|}{4} + \gamma \big[ \varphi\big(f(x^N) - \ell^*\big) - \varphi\big(f(x^{N+1}) - \ell^*\big) \big], \\ \|x^{N+1} - x^{N+2}\| &\le \frac{\|x^N - x^{N+1}\|}{4} + \gamma \big[ \varphi\big(f(x^{N+1}) - \ell^*\big) - \varphi\big(f(x^{N+2}) - \ell^*\big) \big], \\ &\ \ \vdots \\ \|x^{N+k-1} - x^{N+k}\| &\le \frac{\|x^{N+k-2} - x^{N+k-1}\|}{4} + \gamma \big[ \varphi\big(f(x^{N+k-1}) - \ell^*\big) - \varphi\big(f(x^{N+k}) - \ell^*\big) \big]. \end{aligned}$$

Therefore,

$$\sum_{j=1}^{k} \|x^{N+j} - x^{N+j-1}\| \le \frac{1}{4} \sum_{j=1}^{k} \|x^{N+j} - x^{N+j-1}\| + \frac{\|x^{N-1} - x^N\|}{4} - \frac{\|x^{N+k-1} - x^{N+k}\|}{4} + \gamma \big[ \varphi\big(f(x^N) - \ell^*\big) - \varphi\big(f(x^{N+k}) - \ell^*\big) \big].$$

Making use of the non-negativity of $\varphi$, we get

$$\sum_{j=1}^{k} \|x^{N+j} - x^{N+j-1}\| \le \frac{4}{3} \Big[ \frac{\|x^{N-1} - x^N\|}{4} + \gamma \varphi\big(f(x^N) - \ell^*\big) \Big]. \tag{3.20}$$

It follows that

$$\begin{aligned} \|x^{N+k} - x^*\| &\le \|x^N - x^*\| + \sum_{j=1}^{k} \|x^{N+j} - x^{N+j-1}\| \\ &\le \frac{4}{3} \Big[ \|x^N - x^*\| + \frac{\|x^{N-1} - x^N\|}{4} + \gamma \varphi\big(f(x^N) - \ell^*\big) \Big] < \delta. \end{aligned}$$

Thus, $x^k \in B(x^*; \delta)$ for all $k \ge N$. Since $x^k \in B(x^*; \delta)$ and $\ell^* < f(x^k) < \ell^* + \nu$ for all $k \ge N$, it follows from (3.20) by letting $k \to +\infty$ that $\sum_{k=1}^{\infty} \|x^{k+1} - x^k\| < +\infty$. Therefore, $\{x^k\}$ is a Cauchy sequence and hence it is a convergent sequence. Since $x^*$ is a cluster point of $\{x^k\}$, the limit is $x^*$, which is a critical point of $f$ by Theorem 3.5(iii). □

Below is another theorem which gives sufficient conditions that guarantee the convergence of the sequence $\{x^k\}$ generated by the (GPPA). In contrast to Theorem 3.8, we require the differentiability with Lipschitz gradient of the function $g_1 + g_2$ instead of $h$, along with the strong Kurdyka-Łojasiewicz property of $f$. In this case, without loss of generality, we can assume that $g_1(x) = 0$. In the next result, for convenience, we put $g_2(x) = g(x)$.

Theorem 3.9 Consider the difference of functions $f = g - h$ with $\inf_{x \in \mathbb{R}^n} f(x) > -\infty$. Suppose that $g$ is differentiable and $\nabla g$ is $L$-Lipschitz continuous, $f$ has the strong Kurdyka-Łojasiewicz property at any point $x \in \operatorname{dom} f$, and $h$ is a finite convex function. If $C^* \ne \emptyset$, then the (GPPA) sequence $\{x^k\}$ converges to a critical point of $f$.


Proof The proof is very similar to that of Theorem 3.8, except for a few adjustments. Note that $f$ is locally Lipschitz continuous under the assumptions made since $g$ is a $C^1$ function and $h$ is a finite convex function. By (3.10), we have

$$y^{k-1} - t (x^k - x^{k-1}) = \nabla g(x^k) \quad \text{and} \quad y^k - t (x^{k+1} - x^k) = \nabla g(x^{k+1}).$$

This implies

$$y^k - \big( y^{k-1} - t (x^k - x^{k-1}) \big) = \nabla g(x^{k+1}) - \nabla g(x^k) + t (x^{k+1} - x^k).$$

Making use of the Lipschitz continuity of $\nabla g$ yields

$$\| y^k - \big( y^{k-1} - t (x^k - x^{k-1}) \big) \| = \| \nabla g(x^{k+1}) - \nabla g(x^k) + t (x^{k+1} - x^k) \| \le (L + t) \|x^k - x^{k+1}\|.$$

On the other hand,

$$y^k - \big( y^{k-1} - t (x^k - x^{k-1}) \big) \in \partial h(x^k) - \nabla g(x^k) = \partial^F (-f)(x^k) \subset \partial^C (-f)(x^k).$$

Since $\partial^C (-f)(x^k) = -\partial^C f(x^k)$, we have

$$\operatorname{dist}\big(0; \partial^C f(x^k)\big) = \operatorname{dist}\big(0; \partial^C (-f)(x^k)\big) \le (L + t) \|x^k - x^{k+1}\|.$$

Choose $N$ as in (3.16) and (3.17), with a radius $r$ in place of $\delta$ and with $\gamma = \frac{2(L + t)}{t - L}$ instead of $\frac{2M}{t - L}$ as before. For all $k$ large enough such that $x^k \in B(x^*; r)$ and $\ell^* < f(x^k) < \ell^* + \nu$, we have

$$\begin{aligned} (L + t) \|x^k - x^{k+1}\| \big[ \varphi\big(f(x^k) - \ell^*\big) - \varphi\big(f(x^{k+1}) - \ell^*\big) \big] &\ge \operatorname{dist}\big(0; \partial^C f(x^k)\big) \big[ \varphi\big(f(x^k) - \ell^*\big) - \varphi\big(f(x^{k+1}) - \ell^*\big) \big] \\ &\ge \operatorname{dist}\big(0; \partial^C f(x^k)\big) \varphi'\big(f(x^k) - \ell^*\big) \big[ f(x^k) - f(x^{k+1}) \big] \\ &\ge \frac{t - L}{2} \|x^k - x^{k+1}\|^2. \end{aligned}$$

It follows that

$$\|x^k - x^{k+1}\| \le \gamma \big[ \varphi\big(f(x^k) - \ell^*\big) - \varphi\big(f(x^{k+1}) - \ell^*\big) \big]. \tag{3.21}$$

From this, the induction to prove that $x^k \in B(x^*; r)$ for all $k \ge N$ can be carried out similarly to the proof of Theorem 3.8. Indeed, suppose that $x^N, \ldots, x^{N+k-1} \in B(x^*; r)$ for some $k \ge 1$. Observe that

$$\begin{aligned} \|x^{N+k} - x^*\| &\le \|x^N - x^*\| + \sum_{j=1}^{k} \|x^{N+j-1} - x^{N+j}\| \\ &\le \|x^N - x^*\| + \gamma \sum_{j=1}^{k} \big[ \varphi\big(f(x^{N+j-1}) - \ell^*\big) - \varphi\big(f(x^{N+j}) - \ell^*\big) \big] \\ &\le \|x^N - x^*\| + \gamma \varphi\big(f(x^N) - \ell^*\big) < r. \end{aligned}$$

Thus, $x^k \in B(x^*; r)$ for all $k \ge N$. Since $x^k \in B(x^*; r)$ and $\ell^* < f(x^k) < \ell^* + \nu$ for all $k \ge N$, we can sum (3.21) from $k = N$ to some $N_1$ greater than $N$ and take the limit as $N_1 \to +\infty$, showing that $\sum_{k=1}^{\infty} \|x^{k+1} - x^k\| < +\infty$. This completes the proof. □

In the proposition below, we give sufficient conditions for the set of cluster points $C^*$ of the (GPPA) sequence $\{x^k\}$ to be nonempty.

Proposition 3.10 Consider the function $f = g - h$, where $g = g_1 + g_2$ in (3.1). Let $\{x^k\}$ be the sequence generated by the (GPPA) for solving (3.1). The set of cluster points $C^*$ of $\{x^k\}$ is nonempty if one of the following conditions is satisfied:

(i) For any $\alpha$, the lower level set $\mathcal{L}_{\le \alpha} := \{ x \in \mathbb{R}^n : f(x) \le \alpha \}$ is bounded.

(ii) $\liminf_{\|x\| \to +\infty} h(x) = +\infty$ and $\liminf_{\|x\| \to +\infty} \frac{g(x)}{h(x)} > 1$.

Proof The conclusion under (i) follows directly from the facts that $f(x^k) \le f(x^0)$ for all $k$ and $\mathcal{L}_{\le f(x^0)}$ is bounded. Now assume that (ii) is satisfied. Then there exist $M > 1$ and $R > 0$ such that $g(x) \ge M h(x)$ for all $x$ satisfying $\|x\| \ge R$. It follows that

$$\liminf_{\|x\| \to +\infty} f(x) = \liminf_{\|x\| \to +\infty} [g(x) - h(x)] \ge (M - 1) \liminf_{\|x\| \to +\infty} h(x) = +\infty.$$

Thus, $f$ is coercive. Combining this with the descent property of the sequence $\{f(x^k)\}$, we can conclude that $\{x^k\}$ is bounded. □

It is known from [23, Corollary 16] and [15, Section 4.3] that a proper lower semicontinuous semi-algebraic function $f$ on $\mathbb{R}^n$ always satisfies the Kurdyka-Łojasiewicz property at all points in $\operatorname{dom} \partial f$ with $\varphi(s) = c s^{1-\theta}$ for some $\theta \in [0, 1[$ and $c > 0$. We now derive convergence rates of the (GPPA) sequence by examining the range of the exponent.

Theorem 3.11 Consider the settings of Theorems 3.8 and 3.9. Suppose further that $f$ is a proper closed semi-algebraic function so that the function $\varphi$ in the Kurdyka-Łojasiewicz property has the form $\varphi(s) = c s^{1-\theta}$ for some $\theta \in [0, 1[$ and $c > 0$. Then we have the following conclusions.

(i) If $\theta = 0$, then the sequence $\{x^k\}$ converges in a finite number of steps.

(ii) If $0 < \theta \le \frac{1}{2}$, then there exist $\mu > 0$ and $q \in (0, 1)$ satisfying

$$\|x^k - x^*\| \le \mu q^k.$$

(iii) If $\frac{1}{2} < \theta < 1$, then there exists $\mu > 0$ such that

$$\|x^k - x^*\| \le \mu k^{-\frac{1-\theta}{2\theta - 1}}.$$

Proof For each $k \ge 1$, set $\Delta_k = \sum_{p=k}^{\infty} \|x^{p+1} - x^p\|$ and set $\ell_k = f(x^k) - \ell^*$. It is obvious from the triangle inequality that $\|x^k - x^*\| \le \Delta_k$. From the Kurdyka-Łojasiewicz property with the special form of $\varphi$, we have

$$c (1 - \theta) \ell_k^{-\theta} \operatorname{dist}\big(0; \partial^L f(x^k)\big) \ge 1. \tag{3.22}$$

From the proof of Theorem 3.9, if $\nabla g$ is $L$-Lipschitz continuous, then

$$\operatorname{dist}\big(0; \partial^L f(x^k)\big) \le (L + t) \|x^{k+1} - x^k\|$$

for all sufficiently large $k$. Combining this with (3.21) yields

$$\Delta_k \le \gamma \varphi(\ell_k) \le \gamma \varphi(\ell_{k-1}) = \gamma c\, \ell_{k-1}^{1-\theta} \le \gamma c\, [(L + t) c (1 - \theta)]^{\frac{1-\theta}{\theta}} \|x^k - x^{k-1}\|^{\frac{1-\theta}{\theta}},$$

where $\gamma = \frac{2(L + t)}{t - L}$.

In the case of Theorem 3.8, where $\nabla h$ is $L(h)$-Lipschitz continuous, we have

$$\operatorname{dist}\big(0; \partial^L f(x^k)\big) \le M \|x^k - x^{k-1}\|$$

for all sufficiently large $k$, where $M = L(h) + L + t$. It follows from (3.20) that

$$\Delta_k \le \frac{4}{3} \Big[ \frac{\|x^k - x^{k-1}\|}{4} + \gamma \varphi(\ell_k) \Big] \le \frac{\|x^k - x^{k-1}\|}{3} + \frac{4 c \gamma}{3} [M c (1 - \theta)]^{\frac{1-\theta}{\theta}} \|x^k - x^{k-1}\|^{\frac{1-\theta}{\theta}},$$

where $\gamma = \frac{2M}{t - L}$. Thus, in both cases it always holds that

$$\Delta_k \le C_1 (\Delta_{k-1} - \Delta_k) + C_2 (\Delta_{k-1} - \Delta_k)^{\frac{1-\theta}{\theta}}$$

for some $C_1, C_2 > 0$. The result now follows from the proof of [32, Theorem 2]. □


4 Examples 

Trust-Region Subproblem. Consider the trust-region subproblem

$$\min \Big\{ \phi(x) = \frac{1}{2} x^T A x + b^T x : \|x\|^2 \le r^2 \Big\}, \tag{4.1}$$

where $A$ is an $n \times n$ real symmetric matrix and $b \in \mathbb{R}^n$ is given. Since $A$ is not required to be positive semidefinite, (4.1) is a nonconvex optimization problem. Let $E = \{ x \in \mathbb{R}^n : \|x\| \le r \}$ and define the function

$$f(x) = \phi(x) + \delta(x; E), \quad x \in \mathbb{R}^n.$$

The trust-region subproblem (4.1) can be solved by the (DCA) with the following DC decomposition $f = g - h$ with

$$g(x) = \frac{1}{2} \rho \|x\|^2 + b^T x + \delta(x; E) \quad \text{and} \quad h(x) = \frac{1}{2} x^T (\rho I - A) x,$$

where $\rho$ is a positive number such that $\rho I - A$ is positive semidefinite; see [6]. The convergence of the (DCA) sequence for solving (4.1) was established in [34].

Define

$$g_2(x) = \frac{1}{2} \rho \|x\|^2 + b^T x \quad \text{and} \quad g_1(x) = \delta(x; E).$$

In this case, $g_2$ and $h$ have Lipschitz gradients with Lipschitz constants $L = \rho$ and $L(h) = \lambda_{\max}(\rho I - A)$, respectively. Applying the (GPPA) to (4.1), we have $y^k = \nabla h(x^k) = (\rho I - A) x^k$ and

$$y^k - \nabla g_2(x^{k+1}) - t (x^{k+1} - x^k) \in \partial g_1(x^{k+1}).$$

This implies

$$y^k + t x^k - b \in (t + \rho) x^{k+1} + N(x^{k+1}; E).$$

Thus,

$$x^{k+1} = P_E \Big( \frac{1}{t + \rho} \big( (t + \rho) x^k - A x^k - b \big) \Big).$$
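The update displayed above is straightforward to run. The following sketch applies it to a small indefinite instance; the matrix `A`, the vector `b`, the radius, the values $\rho = 2$ and $t = 3$, and all function names are assumptions chosen for illustration ($\rho I - A$ is positive semidefinite and $t > L = \rho$ for this data).

```python
import math

def mat_vec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def project_ball(z, r):
    """Euclidean projection onto E = {x : ||x|| <= r}."""
    n = math.sqrt(sum(zi * zi for zi in z))
    return list(z) if n <= r else [r * zi / n for zi in z]

def phi(A, b, x):
    """phi(x) = 0.5 * x^T A x + b^T x."""
    Ax = mat_vec(A, x)
    return 0.5 * sum(xi * axi for xi, axi in zip(x, Ax)) + sum(bi * xi for bi, xi in zip(b, x))

def trust_region_gppa(A, b, r, rho, t, x0, eps=1e-10, max_iter=10000):
    """Iterate x^{k+1} = P_E(((t + rho) x^k - A x^k - b) / (t + rho)); returns all iterates."""
    xs = [list(x0)]
    for _ in range(max_iter):
        x = xs[-1]
        Ax = mat_vec(A, x)
        z = [xi - (axi + bi) / (t + rho) for xi, axi, bi in zip(x, Ax, b)]
        x_next = project_ball(z, r)
        xs.append(x_next)
        if math.sqrt(sum((u - v) ** 2 for u, v in zip(x, x_next))) < eps:
            break
    return xs

A = [[1.0, 0.0], [0.0, -2.0]]   # symmetric and indefinite (illustrative data)
b = [1.0, 1.0]
iterates = trust_region_gppa(A, b, r=1.0, rho=2.0, t=3.0, x0=[0.0, 0.0])
```

Along the iterates the values of $\phi$ are non-increasing and every iterate stays in the ball $E$, which matches the descent property (3.9).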
Proposition 4.1 Consider the trust-region subproblem (4.1). Then $C^* \ne \emptyset$ and the (GPPA) sequence $\{x^k\}$ converges to a critical point of $f = g_1 + g_2 - h$.

Proof We only need to verify that all assumptions of Theorem 3.8 are satisfied in this particular case. Note that $f(x) = \phi(x) + \delta(x; E)$. Obviously, $\inf_{x \in \mathbb{R}^n} f(x) > -\infty$ and $C^* \ne \emptyset$. Let us show that $f$ is a semi-algebraic function. Note that

$$E = \{ x \in \mathbb{R}^n : p(x) \le r^2 \},$$

where $p$ is the polynomial $p(x) = \sum_{i=1}^{n} x_i^2$. Thus, $E$ is a semi-algebraic set, which implies that its associated indicator function is a semi-algebraic function; see, e.g., [15]. It is also straightforward that $\phi$ is a semi-algebraic function since its graph

$$\operatorname{gph} \phi = \Big\{ (x, y) \in \mathbb{R}^n \times \mathbb{R} : \frac{1}{2} x^T A x + b^T x - y = 0 \Big\}$$

is a semi-algebraic set. It follows that $f$ is a semi-algebraic function as it is the sum of two semi-algebraic functions; see, e.g., [15]. Therefore, $f$ satisfies the Kurdyka-Łojasiewicz property. Obviously, $h$ has Lipschitz continuous gradient. We have shown that all assumptions of Theorem 3.8 are satisfied, and the conclusion follows from Theorem 3.8. □

Nonconvex Feasibility Problems. In this part, we show how the (GPPA) can be applied
to solve nonconvex feasibility problems. Let $A$ and $B$ be two nonempty closed sets in $\mathbb{R}^n$.
It is implicitly assumed that $A$ and $B$ are simple enough so that the projection onto each
set is easy to compute. The feasibility problem asks for a point in $A \cap B$. It is clear that
$A \cap B \ne \emptyset$ if and only if the following optimization problem has optimal value zero:

\[ \min\Big\{ \tfrac12 d_B^2(x) : x \in A \Big\}. \tag{4.2} \]

This problem is of the type (3.1) with the objective function $f(x) = g_1(x) + g_2(x) - h(x)$,
where

\[ g_1(x) = \delta(x; A), \quad g_2(x) = \tfrac12\|x\|^2, \quad h(x) = \tfrac12\big(\|x\|^2 - d_B^2(x)\big). \]

Obviously, the function $g_2$ is differentiable with $L$-Lipschitz gradient, where $L = 1$. We
have

\[
\begin{aligned}
h(x) &= \tfrac12\|x\|^2 - \tfrac12\inf\{\|x\|^2 + \|y\|^2 - 2\langle x, y\rangle : y \in B\} \\
     &= \sup\Big\{\langle x, y\rangle - \frac{\|y\|^2}{2} : y \in B\Big\} \\
     &= \sup\{f_y(x) : y \in B\},
\end{aligned}
\]

where $f_y(x) = \langle x, y\rangle - \frac{\|y\|^2}{2}$. Therefore, $h$ is a pointwise supremum of a collection of affine
functions, so it is a convex function. Denote $S(x) = \{y \in B : f_y(x) = h(x)\}$. We have

\[ S(x) = \{y \in B : \|x - y\|^2 = d_B^2(x)\} = P_B(x). \]


Since $B$ is a nonempty closed subset of $\mathbb{R}^n$, the set $S(x) = P_B(x)$ is nonempty and compact
for any $x \in \mathbb{R}^n$. By [33, Theorem 3, p. 201], we have

\[ \partial h(x) = \operatorname{co}\Big(\bigcup_{y \in S(x)} \partial f_y(x)\Big) = \operatorname{co} P_B(x). \]
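The two descriptions of $h$ and the subdifferential formula can be illustrated on a finite (hence closed and nonconvex) set $B$, where the projection is computed by enumeration. This is only a sketch under that assumption; the set `B`, the helper names, and the test points are hypothetical.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def sqdist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def h_sup(x, B):
    """h(x) as the supremum of the affine functions f_y(x) = <x, y> - ||y||^2 / 2."""
    return max(dot(x, y) - 0.5 * dot(y, y) for y in B)

def h_direct(x, B):
    """h(x) = (||x||^2 - d_B(x)^2) / 2 computed from the distance function."""
    return 0.5 * (dot(x, x) - min(sqdist(x, y) for y in B))

def projection(x, B, tol=1e-12):
    """All nearest points of the finite set B to x (the set P_B(x))."""
    d2 = min(sqdist(x, y) for y in B)
    return [y for y in B if sqdist(x, y) <= d2 + tol]

# Hypothetical finite set B; the origin has two nearest points in it.
B = [(1.0, 0.0), (-1.0, 0.0), (0.0, 2.0)]
x = (0.0, 0.0)
nearest = projection(x, B)                      # both (1, 0) and (-1, 0)
midpoint = (0.0, 0.0)                           # a point of co P_B(x)
test_points = [(0.3, -0.7), (-1.5, 0.2), (0.0, 3.0), (2.0, 2.0)]
# Subgradient inequality h(z) >= h(x) + <v, z - x> for v in co P_B(x).
slack = [h_sup(z, B) - h_sup(x, B) - dot(v, (z[0] - x[0], z[1] - x[1]))
         for z in test_points for v in nearest + [midpoint]]
```

Here the projection of the origin is not a singleton, so the convex hull in the formula is a genuine segment, and the nonnegative entries of `slack` are consistent with every point of that segment being a subgradient of $h$ at the origin.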
Making use of Proposition 3.1, we can now state a necessary condition for a local minimum
of (4.2).

Proposition 4.2 If $\bar{x} \in A$ is a local optimal solution of (4.2), then

\[ P_B(\bar{x}) \subset \bar{x} + N^L(\bar{x}; A), \tag{4.3} \]

where $N^L(\bar{x}; A)$ is the limiting normal cone to $A$ at $\bar{x}$ defined by $N^L(\bar{x}; A) = \partial^L \delta(\bar{x}; A)$.

Note that the optimality condition (4.3) is not sufficient to ensure that $\bar{x}$ is a local minimizer
of (4.2), as shown in the next example.

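Example 4.3 Consider the following subsets of $\mathbb{R}^2$:

\[ A = \{(x_1, x_2) \in \mathbb{R}^2 : x_2 \ge 1\} \quad\text{and}\quad B = \{(x_1, x_2) \in \mathbb{R}^2 : x_2 \le a x_1^2\}, \]

where $0 < a < \frac12$. Put $\bar{x} = (0, 1) \in A$. Since $a < \frac12$, the system

\[
\begin{cases}
x_1^2 + (x_2 - 1)^2 \le 1, \\
x_2 - a x_1^2 \le 0,
\end{cases}
\]

has a unique solution $(x_1, x_2) = (0, 0)$. This implies $P_B(\bar{x}) = \{(0, 0)\}$ and $d_B(\bar{x}) = 1$.
Obviously, $\bar{x}$ satisfies condition (4.3) since

\[ P_B(\bar{x}) = \{(0, 0)\} \subset \{(0, \gamma) : \gamma \le 1\} = \bar{x} + N(\bar{x}; A). \]

However, for any neighborhood $U$ of $\bar{x}$, there always exists $\varepsilon > 0$ small enough such that
$x_\varepsilon = (\varepsilon, 1) \in U$ and

\[ d_B(x_\varepsilon) \le 1 - a\varepsilon^2 < 1. \]

Thus, $\bar{x}$ cannot be a local minimizer of (4.2).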
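The two distance estimates in this example can be checked numerically. The sketch below assumes the particular value $a = 0.25$ and a finite sampling grid, both chosen here for illustration; the helper names are hypothetical. It uses the fact that the point $(0, 1)$ lies outside the closed set $B$, so its nearest points in $B$ lie on the boundary parabola.

```python
import math

a = 0.25              # assumed value; any a with 0 < a < 1/2 fits the example
xbar = (0.0, 1.0)

def in_B(p):
    """Membership test for B = {(x1, x2) : x2 <= a * x1^2}."""
    return p[1] <= a * p[0] ** 2

def boundary_sqdist_to_xbar(s):
    """Squared distance from xbar to the boundary point (s, a*s^2) of B."""
    return s * s + (a * s * s - 1.0) ** 2

def witness(eps):
    """A point of B directly below x_eps = (eps, 1)."""
    return (eps, a * eps ** 2)

def upper_bound_dist_B(eps):
    """Distance from x_eps to the witness point, an upper bound on d_B(x_eps)."""
    w = witness(eps)
    return math.hypot(eps - w[0], 1.0 - w[1])

# Sampled check that no boundary point is closer to xbar than (0, 0).
grid = [i / 1000.0 for i in range(-3000, 3001)]
min_sq = min(boundary_sqdist_to_xbar(s) for s in grid)

bounds = {eps: upper_bound_dist_B(eps) for eps in (0.1, 0.01, 0.001)}
```

On the grid the smallest squared distance is attained at the origin and equals one, while each entry of `bounds` is strictly below one, in line with the claim that nearby feasible points of $A$ are strictly closer to $B$.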

Based on the (GPPA), we now propose the following simple algorithm for solving (4.2).
For a given initial point $x^0 \in A$, the (GPPA) sequence $\{x^k\}$ with the starting point $x^0$ is
defined by

\[ x^{k+1} \in P_A\Big(\Big(1 - \frac{1}{t}\Big)x^k + \frac{y^k}{t}\Big), \tag{4.4} \]

where $y^k$ is an element chosen in $\operatorname{co} P_B(x^k)$. Note that this scheme is different from some
other well-known methods such as the alternating projection algorithm or the averaged projection
algorithm. Moreover, it cannot be obtained from the proximal forward-backward
schemes in [27,30].
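A small planar instance shows how scheme (4.4) behaves. The sketch below is an illustration under assumptions made here, not an experiment from the paper: $A$ is taken to be the unit circle (closed and nonconvex), $B$ the half-plane $\{x : x_1 \ge 0.5\}$, the parameter is $t = 2$, and $y^k$ is always the projection of $x^k$ onto $B$. All function names are hypothetical.

```python
import math

def project_circle(z):
    """A nearest point of A = {x : ||x|| = 1}; at z = 0 every unit vector is nearest."""
    n = math.hypot(z[0], z[1])
    return (1.0, 0.0) if n == 0.0 else (z[0] / n, z[1] / n)

def project_halfplane(x, c=0.5):
    """Projection onto B = {x : x_1 >= c}."""
    return (max(x[0], c), x[1])

def dist2_B(x, c=0.5):
    """Squared distance from x to B."""
    y = project_halfplane(x, c)
    return (x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2

def gppa_feasibility(x0, t=2.0, iters=200):
    """Iterate x <- P_A((1 - 1/t) x + y/t) with y = P_B(x); returns all iterates."""
    xs = [x0]
    x = x0
    for _ in range(iters):
        y = project_halfplane(x)
        z = ((1 - 1 / t) * x[0] + y[0] / t, (1 - 1 / t) * x[1] + y[1] / t)
        x = project_circle(z)
        xs.append(x)
    return xs

xs = gppa_feasibility((-0.6, 0.8))   # starting point on the circle
x_last = xs[-1]
```

From this starting point the iterates stay on the circle and approach a point of $A \cap B$. Started instead at $(-1, 0)$, the same iteration does not move: that point satisfies (4.3) without being feasible, which shows that the scheme may stop at a critical point that is not a solution of the feasibility problem.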

Theorem 4.4 Let $A$ and $B$ be nonempty closed sets in $\mathbb{R}^n$ and let $t > 1$. Then the
sequence $\{x^k\} \subset A$ satisfies the following:

(i) For any $k \ge 1$,

\[ \tfrac12 d_B^2(x^k) - \tfrac12 d_B^2(x^{k+1}) \ge \frac{t-1}{2}\|x^k - x^{k+1}\|^2. \]

(ii) $\lim_{k \to +\infty} \|x^k - x^{k+1}\| = 0$.

(iii) If $\{x^k\}$ is bounded, then every cluster point is a critical point of $f = \delta(\cdot; A) + \tfrac12 d_B^2(\cdot)$.
Proposition 4.5 Let $A$ and $B$ be nonempty closed sets in $\mathbb{R}^n$ such that both of them are
semi-algebraic sets and $B$ is convex. Suppose further that either $A$ or $B$ is bounded. Then
the sequence $\{x^k\}$ generated by the (GPPA) converges to a critical point of (4.2).

Proof As $A$ is a semi-algebraic set, the indicator function $\delta(\cdot; A)$ is a semi-algebraic function.
On the other hand, $B$ is also semi-algebraic, so $x \mapsto \tfrac12 d_B^2(x)$ is also a semi-algebraic function;
see [30, Lemma 2.3]. Therefore, $f(x) = \delta(x; A) + \tfrac12 d_B^2(x)$ is a semi-algebraic function. Since
$B$ is closed and convex, it is well known that the function $x \mapsto \tfrac12 d_B^2(x)$ is smooth with
1-Lipschitz continuous gradient; see [35, Corollary 12.30]. Hence $h$ is smooth with Lipschitz
continuous gradient $\nabla h = P_B$. The result now follows directly
from Theorem 3.8 since the boundedness of $\{x^k\}$ is ensured by the coercivity of $f$ under
the assumption that either $A$ or $B$ is bounded. □

5 Concluding Remarks 

Based on recent progress in using the Kurdyka-Łojasiewicz property and variational analysis
to analyze nonsmooth optimization algorithms, we introduced a proximal point algorithm
for minimizing differences of functions and studied its convergence. We are
able to relax some convexity in the classical DC programming to deal with a more general 
class of problems. The results open up the possibility of understanding the convergence 
of the (DCA) and other algorithms for minimizing differences of convex functions used in 
numerous applications. 